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ABSTRACT 

In  earlier  work  (Gelfand  and  Smith,  1990  and  Gelfand  et  al, 
1989)  a  sampling  based  approach  using  the  Gibbs  sampler  was  offered 
as  a  means  for  developing  marginal  posterior  densities  for  a  wide 
range  of  Bayesian  problems  several  of  which  were  previously 
inaccessible.  Our  purpose  here  is  two-fold.  First  we  flesh  out 
the  implementation  of  this  approach  for  calculation  of  arbitrary 
expectations  of  interest.  Secondly  we  offer  comparison  with 
perhaps  the  most  prominent  approach  for  calculating  posterior 
expectations,  analytic  approximation  involving  application  of  the 
LaPlace  method.  Several  illustrative  examples  are  discussed  as 
well.  Clear  advantages  for  the  sampling  based  approach  emerge. 
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1.  Introduction 

.An  important  issue  in  Bayesian  inference  is  the  calculation  of  marginal  posterior 
expectations.  Let  [Y|0)  denote  the  likelihood  where  Y  is  the  observed  data  and  0  is  a 
vector  of  unobservable  random  parameters.  Specifying  a  prior  distribution  for  0  yields 
the  marginal  posterior  distribution  for  0  denoted  by  [0|  Y].  Interest  typically  centers  on 
calculating  the  marginal  posterior  distribution  itself  along  with  associated  expectations 
such  as  E(0i|Y),  E(0^jlY),  E(0i0j|Y),  E(la,b(0i|  Y))  where  la,b  denotes  the  indicator 
function  of  the  interval  (a,b).  In  many  applications  functions  of  0,  g(0)  are  of  interest 
e.g.,  0i-0j,  0i/0j,  max  0i,  E(W(Y)|0)  (for  an  appropriate  VV)  where  again  the  marginal 
posterior  distribution  and  associated  expectations  would  be  sought. 

The  technical  problems  encountered  in  attempting  to  carry  out  the  required 
numerical  integrations  to  obtain  these  distributions  and  expectations  have  long  served  as 
an  impediment  to  the  wider  application  of  the  Bayesian  framework  to  real  data. 

In  two  recent  papers  (Gelfand  and  Smith,  1990,  Gelfand  et  al  1989)  the  Gibbs 
sampler  was  discussed  in  the  context  of  calculation  of  marginal  posterior  densities  for  a 
wide  range  of  Bayesian  problems  several  of  which  were  previously  inaccessible.  The  Gibbs 
sampler  offers  conceptual  simplicity  and  straightforward  implementation  avoiding 
sophisticated  numerical  or  analytic  approximation  expertise  and  associated  specialist 
software  (as  in  e.g.  Naylor  and  Smith  1982,  1988,  Smith  et  al  1985,  1987.  Shaw  1988. 


Gewoke  1988).  The  Gibbs  sampler  was  proposed  for  and  has  been  implemented  for  v(>ry 
high  dimensional  problems  (Geman  and  Geman.  1984)  e.g.  10^  or  more  variables.  By 
comparison  even  a  complicated  Bayesian  application  would  usually  have  low  dimension. 
Thus  in  our  applications  this  sampler  has  proved  very  efficient  converging  remarkably 
quickly.  Indeed  we  need  not  compromise  high  dimensional  integration  by  replacing 
integration  with  estimation  as  is  often  done  in  practice. 

The  purpose  of  this  note  is  to  show  that  not  only  does  the  Gibbs  sampler  enable 
calculation  of  marginal  posterior  densities  but  it  also  enables  routine  calculation  of 
expectations  such  as  those  described  above. 

Since  our  emphasis  here  is  on  expectations,  in  the  next  section  we  review  existing 
approaches  both  analytic  and  numerical.  In  the  third  section  we  briefly  review  the  Gibbs 
sampler  followed  by  the  development  of  its  application  to  calculating  expectations.  In 
Section  4  we  provide  a  variety  of  illustrative  applications.  We  conclude  in  Section  5  with  a 
brief  summary. 

2.  Ebdsting  Approaches  for  Posterior  Expectations 

With  regard  to  performing  the  required  numerical  integrations  directly  there  have 
t)een  several  recent  advances  (see  e.g.  Naylor  and  Smith  1982.  1988,  Smith  ei  al  1985. 
1987).  .As  noted  in  the  introduction,  implementation  requires  both  numerical  expertise  and 
highly  sophisticated  software.  These  approaches  employ  quadrature  methods,  are  typically 
most  successful  in  integrating  functions  which  are  of  the  form  "polynomial  *  normal 
density"  and  will  usually  not  accommodate  more  than  six  dimensions.  (Of  course 
insightful  manipulation  of  joint  distributions  can  frequently  reduce  higher  dimensional 
problems  to  forms  requiring  at  most  six  dimensional  integration). 

An  alternative  direction  for  numerical  integration  has  been  through  Monte  Carlo 
methods  as  in  the  work  of  Stewart  (1983,  1984),  Van  Dijk  and  Kloek  (1980,  1984)  and 
Shaw  (1988)  perhaps  in  its  most  refined  form  using  importance  sampling  and  variance 


reduction  techniques  in  recent  work  of  Geweke  (1988,  1989).  Recently  Rubin  (1987.  1988) 
has  proposed  a  sampling/importance  resampling  algorithm  which  in  the  context  of 
hierarchical  models  affords  the  advantage  of  simpler  specification  of  the  importance 
sampling  density.  Again  these  approaches  encounter  difficulty  with  high  dimensionality. 
Moreover  our  empirical  experience  (see,  for  example,  Gelfand  and  Smith,  1990)  suggests 
that  the  "learning"  which  is  inherent  in  an  iterative  algorithm  such  as  the  Gibbs  sampler 
makes  such  an  approach  more  efficient  in  terms  of  amount  of  random  generation  than 
one-off  (non  iterative)  Monte  Carlo  methods. 

Substantial  effort  has  been  devoted  to  the  development  of  analytic  approximations 
for  calculating  expectations.  Most  of  this  work  involves  application  of  Laplace's  method 
(see  e.g.  DeBruijn,  1961).  Writing  these  expectations  as  a  ratio  of  integrals  and 
approximating  numerator  and  denominator  separately  Tierney  and  Kadane  (1986)  obtain  a 
second  order  approximation  for  the  expectation  of  positive  functions.  Other  second  order 
expansions  (Hartigan  1965,  Johnson  1970,  Lindley  1961,  1980)  require  computation  of  more 
derivatives  of  the  log  likelihood  function.  Extension  of  the  Tierney-Kadane  approach  to 
nonpositive  functions  is  discussed  in  Tierney,  Kass  and  Kadane  (1989a).  Kass  and  Steffey 
(1988)  study  these  approximations  in  the  context  of  conditionally  independent  hierarchical 
models.  Density  approximations  based  on  the  Laplace  method  appear  in  Tierney  and 
Kadane  (1986)  with  extensions  in  Tierney,  Kass  and  Kadane  (1989b).  A  related 
approximation  appears  in  Hsu,  Leonard  and  Tsui  (1987).  Work  of  Achcar  and  Smith 
(1989)  shows  that  performance  of  the  Laplace  method  is  often  very  sensitive  to 
parametrization.  Morris  (1988)  offers  expansions  based  on  Pearson  family  kernels  rather 
than  the  normal  kernels  used  in  the  Laplace  method. 

Apart  from  the  sometimes  severe  sensitivity  to  parametrization  (see  Section  4)  these 
approaches  require  at  least  one  and  usually  two  function  maximizations  which  often 
demand  expertise  and  are  at  best  cumbersome.  in  addition,  these  functions  nust 
be  assumed  twice  differentiable-  Moreover  a  separate  function  maximization 


is  required  for  each  expectation  sought.  These  problems  are  magnified  with  increasing 
dimensionality. 

Since  the  Laplace  method  has  received  much  attention  of  late  its  performance  along 
with  that  of  the  Gibbs  sampler  will  be  investigated  in  Section  4.  We  thus  Driefly  review 
the  method  encouraging  the  reader  to  consult  Kass.  Tierney  and  Kadane  (1988)  for  a  fuller 
account. 

Consider  a  set  of  random  variables  U  =  (Ui,-  •  -  .Ut).  Our  interest  is  in  calculating 
E  f(U).  Suppose  f>0  and  suppose  the  joint  density  of  Ui,---.Uk  is  only  knowm  modulo 
normalizing  constant  i.e.  is  proportional  to  g(Ui,- •  •  ,Uk).  Then  E(f)  = 


/g 


e/(y)-r(U*) 


(1) 


where  i  =  -log  g,  T  =  /-log  f,  U  is  the  mode  of  /,  U*  is  the  mode  of  /*.  T.  and  are 
minus  the  inverse  Hessians  of  t  and  f*  evaluated  at  U  and  U*  respectively.  The  form 
(1)  first  appeared  in  Tierney  and  Kadane  (1986)  who  noted  that  when  log  g  =  0(n)  this 
approximation  is  then  accurate  to  order  n  ^ . 

Tierney,  Kass  and  Kadane  (1989a)  suggest  extending  (1)  to  handle  nonpositive  f 
by  approximating  the  moment  generating  function  of  f,  E(exp  (sf)).  and  then 
differentiating  at  s=0.  Alternatively  we  may  add  a  large  constant  c  to  f  such  that 
c-ff>0,  apply  (1)  to  c+f,  and  then  subtract  c  from  the  resulting  approximation.  If  c 
needs  to  be  infinite  we  regain  the  approximation  based  on  the  moment  generating  function. 

Let  hg  =  E(f(U)  [  Ur,r/s).  If  the  joint  density  of  the  Ur,r^s  is  proportional  to  a 
known  function  say  gs(Ur,  r=l,-  •  -k,  r^s)  then,  since  E(f)=E(hs),  Eff)  = 


i 


) 


where  is,  etc.  are  defined  analogous  to  (1).  We  see  that  |2)  replaces  ihe  approximation 
of  a  ratio  of  k-fold  integrals  with  the  approximation  of  a  ratio  of  (k-1  '-fold  inteerais  ana 
thus  should  provide  a  better  approximation  than  1 1).  Hierarchical  Bayes  models  provide  a 
class  of  probability  structures  where  approximation  (2)  can  usually  be  carried  out  (see  Kass 
Steffey,  1989). 

3.  Gibbs  Sampling 

3.1  Review  of  the  Gibbs  Sampler 

In  the  sequel  we  assume  the  existence  of  densities  with  respect  to  either  Lebesque  or 
counting  measure  as  appropriate.  Densities  will  be  denoted,  generically,  by  square  brackets 
so  that  joint,  conditional  and  marginal  forms  appear,  respectively,  as  [U,V)  [U|  V]  and  |V]. 
The  usual  marginalization  by  integration  procedure  will  be  denoted  by  forms  such  as 

[U!  =  /(U|vi-|v| 

We  shall  require  that  our  collections  of  random  variables  are  such  that  specification  of  all 
full  conditional  distributions  uniquely  determines  the  full  joint  density  (see  Besag.  1974). 
.More  precisely,  for  such  a  collection  of  random  variables  I’l,  U^,-  •  -  -I'k.  the  joint  density. 
[Ui,  is  assumed  uniquely  determined  by  Us|Ur.  r^s],  s=1.2.  --.k.  Our 

interest  is  in  the  marginal  distributions,  [Us,],  s=1.2,-  •  -k. 

An  algorithm  for  extracting  marginal  distributions  from  the  full  conditional 
distribution  was  formally  introduced  as  the  Gibbs  sampler  in  Geman  and  Geman  (1984). 
The  algorithm  requires  all  the  full  conditional  distributions  to  be  "available"  for  sampling, 
where  "available"  is  taken  to  mean  that,  for  example.  Us  can  be  generated 
straightforwardly  and  efficiently  given  specified  values  of  the  conditioning  variables,  Ur, 
r^s.  We  return  to  this  matter  at  the  end  of  this  section. 


fi 


Gibbs  sampling  is  a  Markovian  updating  scheme  which  proceeds  as  toiiows.  Given 


an  arbitrary  starting  set  of  values 
then  Uj”  from  [U2IU1**,  •  •  and  so  on  up  to 


we  draw  I’,*  from  ' L'l  i  ]. 

from 


.I  k 1 1  1'  to  complete  one  iteration  of  the  scheme,  .\fter  t  such  iterations  we 


would  arrive  at  Geman  and  Geman  show  under  mild  conditions  that 

(U*/’,- • (Ui,---,Uk)  -  [Ui,  U2,*‘*Uk]  as  t-^.  Hence  for  t  large  enough 


rUI 


U 


( t) 


for  example  will  be  regarded  as  a  simulated  observation  from  [Us]-  Replication  of 


the  process  m  times  yields  m  i id  k— tuples  j=l,- •  •  .m.  Diagnostics  to 

assess  convergence  are  critical.  Some  tentative  discussion  appears  in  Gelfand  et  al  (1989) 
but  a  fuller  account  is  deferred  to  a  future  paper.  Vote  that  sample  size  at  say  the  t‘^ 
iteration  may  be  increased  from  m  to  any  specified  size  by  sampling  with  replacement 
from  the  vectors  (Uj-’,-  •  j=l,*  •  -m. 


Application  in  the  Bayesian  framework  takes  the  Ug  to  be  unobservable 
representing  either  parameters  or  missing  data.  All  distributions  will  be  viewed  as 
conditional  on  the  observed  data.  [Ui,*  •  -Uk]  becomes  the  joint  posterior  density  whose 
form  is  therefore  known  modulo  normalizing  constant.  There  is  no  question  as  to  whether 
the  full  conditional  densities  uniquely  determine  the  Joint  density:  the  full  conditional 
densities  will  have  been  obtained  from  the  form  of  the  joint  density  functions.  Functions. 
f(Ui,-  •  -  .Uk),  whose  density  and  expectation  we  seek  will  arise  as  interesting  functions  of 
the  parameters. 

We  conclude  this  section  with  a  remark  regarding  the  required  generation  from  full 
conditional  distributions.  In  a  hierarcbical  Bayesian  model  the  full  conditional 
distributions  take  reduced  forms  (see  Gelfand  and  Smith.  1990,  Section  3.2).  In  particular 
if  conjugacy  is  assumed  for  the  distribution  at  a  given  pair  of  adjacent  stages  say  i.i-f  1,  in 
the  hierarchy  then  the  full  posterior  at  stage  i+1  is  an  updated  version  of  the  prior  at 
stage  i+1  whence  sampling  is  usually  straightforward.  If  conjugacy  is  not  assumed  then 
the  full  posterior  for  any  parameter  will  still  be  known  modulo  normalizing  constant.  In 


this  case  more  sophisiicaied  random  generation  using  for  example  the  ratio  of  uniforms 
method  (see  e.g.  Devroye.  1986)  still  enables  sampling  from  the  full  posterior. 


3.2  Calculating  Expectations 

Since  the  iid  vectors  are  approximately  distributed  as  [Ui,-  •  •  ,Ukl 

we  can  in  principle  create  sample  based  estimates  of  the  marginal  densities  of  any  subset  of 
the  U's  or  indeed  of  any  transformation  of  the  U's  using  kernel  density  estimators  (see  e.g. 
Silverman,  1986).  In  the  same  spirit  an  obvious  sample  based  estimator  of  the  expected 
value  of  an  integrable  function  f(Ui,*  • -.Uk)  takes  the  form 


j=l 


(3) 


We  now  show  that  in  all  such  enterprise  it  behooves  us  to  take  advantage,  when 
possible,  of  availability  of  the  full  conditional  densities  to  improve  our  estimation.  As  in 
the  previous  section,  for  any  s,  E  f(Ut,- •  ^Uk)  =  E(hs)  where  hg  =  E(f|  Ur,  r^s).  But 
var(f)  >  var(hs).  Hence  the  estimator 


m 

E  hs(U‘‘  ,  r^s)/m  (4) 

j=l 

is  better  than  (3)  in  terms  of  mean  squared  error.  In  fact  the  "Rao— Blackwellized" 
estimator  (4)  is  better  than  (3)  under  more  general  loss  functions  (see  e.g.  Ferguson  1967. 
p.  121).  We  typically  take  advantage  of  this  Rao-Blackwellization  by  using  m  smaller 
than  would  be  required  under  (3).  We  note  that  (3)  and  (4)  are  sampling  analogues  of  (1) 
and  (2)  respectively  and  that  (2)  may  be  viewed  as  a  Rao-Blackwellized  version  of  (1). 
We  also  note  that  the  sampling  based  approach  does  not  require  smoothness  or 
differentiability  for  f  and  hg  allowing  application  to  for  example  f=max  Uj. 


With  regard  to  density  estimation  again  if  I's  appears  as  an  argument  of  i  the 
ronditional  density  f!Ur,  r^sj  can  be  obtained  by  univariate  transformation  irom  il  ,1  r- 
r^sj.  The  resulting  Rao— Blackwellized  sample— based  density  estimate  oi  f  would  be 


m 


(f|U‘J',  r^sj/m 


(•^) 


Note  that  we  need  not  modify  the  Gibbs  sampler  so  as  to  sample  f's  in  order  to 
calculate  E(f)  or  to  estimate  [f|.  Note  further  that  the  forms  (3),  (4)  and  (5)  are  invariant 
under  1—1  transformation  of  the  individual  I'i.  Hence,  in  practical  calculation  ot 
expectations  using  the  Gibbs  sampler,  transformation  of  the  parameters  is  not  an  issue  as  it 
is  for  the  Laplace  method. 

By  using  the  full  conditional  distribution  of  each  variable  which  actually  appears  as 
an  argument  of  f,  the  form  (4)  may,  in  principle,  be  used  to  obtain  several  estimators 
which  improve  upon  (3).  Similarly,  several  density  estimators  of  the  form  (5)  can  be 
obtained.  While  adaptive  combination  of  such  estimators  might  be  attempted  (say  by 
approximating  the  variance  of  each  estimator  through  the  delta— method  or  through  a 
sample  reuse  method)  we  suggest  the  simple  unweighted  average.  The  variance 
components  example  in  Section  4.2  provides  an  illustration. 

.4  related  point  involves  the  use  of  reduced  conditional  distributions  and 
expectations  when  available  to  improve  estimation.  We  make  the  argument  in  its  simplest 


form  taking  k=3.  To  calculate  E(U,)  suppose  both  h(U2,U3)=E(U,  |  U2.L3)  and 
W(U2)=E(U,  [Uj)  are  known  in  closed  form.  Since  E(hjU2)=EW,  Rao— Blackwellization 

shows  that  m  *  E  WfUjj*)  is  better  than  m  ’  H  h(U2j-\U3j’).  This  point  concurs  with 
j=l  j=l 

our  intuition  regarding  .Monte  Carlo  integration  for  a  fixed  number  of  independent  points. 


m  —  we  can  approximate  a  single  integral  better  than  a  double  integral.  This  situation 
typically  arises  in  the  context  of  missing  data.  For  instance  in  the  aggregated  multinomial 


'1 


KKampie  of  seciion  4.!  the  reduced  conditional  distributions  i0|\'.Z;  ana  i//|V.Z',  are 
::nmediatelv  unsealed  Beta  distributions. 


A  Conditional  Gibbs  Sampler 

The  Gibbs  sampler  can  be  extended  to  yield  samples  from  arbitrary  conditional 
distributions  as  follows.  Suppose  we  fix  a  subset  of  the  Ur's,  without  loss  of  generality  say 

”’^k  respectively.  In  implementing  the  Gibbs  sampler  suppose  a 
complete  iteration  is  achieved  by  updating  only  Ui,---.U,j/.  That  is.  starting  with 
(Uj  )  we  draw  b,  -[bilUo  =  =  - 

Uk/*p---Uk  =  Uk]  ■•••  Uv’  -  (UkMUl‘\---Ui':,,  =  U;,,„.---i;k  =  Ok].  as  the 

number  of  such  iterations,  t-w,  (Ui,---U,j,)  -  [Ui, •  •  •  .U|,/ 1  = 

Uk/*i,- •  ■  Uk  =  Uk].  Repeating  this  process  m  times  yields  (U5|\- • j=l.- • -m  an 
iid  sample  having  approximately  this  conditional  distribution. 

Estimation  would  then  proceed  as  in  Section  3.2.  For  instance  a  Rao— Blackweilized 
density  estimate  for  [U2IU1  =  ui]  would  be  developed  holding  Ui  fixed  at  ui  in  the 
Gibbs  sampler  and  would  take  the  form 


=  u.j  =  [U2|U,=u,.  U,=u;U  r>2]/m  iG! 

j=l 


.X  Rao— Blackweilized  estimate  for  E(U2iUi=ui)  would  take  the  form 


S  E(U2| Ui=ui,  Ur=Url  .  r>2)/m 

j=l 


10 


'1.  Illustrative  Examples 

In  this  section  we  illustrate  the  methodology  of  Section  3  in  the  context  n  ihree 
examples  chosen  to  reflect  different  types  of  "awkward"  structure  where  the  Gibbs  sampler 
provides  an  easily  implemented  solution. 

4.1.  An  Aggregated  Multinomial  Model 

Gelfand  and  Smith  (1990)  discuss  a  fictitious  data  set  where  some  observations  are 
not  assigned  to  individual  cells  but  to  aggregates  of  cells  (see  e.g.  Hartley.  1958;  Dempster. 
Laird  L  Rubin  1977,  Tanner  i:  Wong,  1987).  In  fact  suppose  the  data  V=(Yi,-  •  -  .Ys)  = 

( 14,1,1,1,5)  are  available  as  a  sample  from  the  multinomial  distribution 

Mult  (22;  I  d  +  g,  I  I  ^  7/  + 

and  that,  as  a  prior  for  we  take  Dirichlet  (1,1,1).  By  considering  instead  a  "split 
cell"  multinomial  of  the  form 

X=(Xi,-  •  •  ,X7)  -  Mult(22;  |  g,  |  |  t/,  |  r/,  |,  2(l-^-»7)) 

we  view  Xi,  X5  as  missing  data  and  construct  a  Gibbs  sampler  involvine;  0.  rj  and  Z=(X,, 
.\5).  The  required  full  conditional  distributions  are 

[e\Y,j],Z]  =  (l-r?)  Be(X,  ^  Y2+I,  Y5+I)  (8) 

[r/l  Y,0,Z]  =  (l-<?)  Be  (Y3  +  X5+I.  Y5+I)  (9) 

and 

[ZlYAv]  =  [X.,  X5I Y,<?,7/]  =  Bi  (Y,,  2^1+2<?)'')  •  Bi  (Y4,  277(3+277)  ') 

Using  "exact"  numerical  methods  (Naylor  and  Smith,  1982)  we  obtain 
E[0|Y]  =  .5199,  £[77!  Y]  =  .1232.  The  Tierney-Kadane  approximation  (1)  on  the  (^,77) 
scale  yields  E(^|Y)  a  .5175  and  E(77|Y)  a  .0882  resulting  in  relative  errors  of  .5%  and 


2S.4%  respectively.  I'nder  an  alternative  parameterization  (for  o.xampie  the  losit  i.> 

considered  in  Achcar  and  Smith,  1989)  we  mighi  fare  better  with  E(  tji  \  1. 

For  the  sampling  based  approaches  in  addition  we  considered  F(^  iV).  E(t;'  iV). 

var(^l'i')  and  var(7/j\'i.  a  total  of  six  expectations.  Simulation  is  required  to  study  the 

performance  of  these  approaches.  We  obtained  5000  repetitions  of  the  Gibbs  sampler 

where  for  each  repetition  we  took  only  10  iterations  and  set  m  to  be  only  20.  For  each 

repetition  we  calculated  estimates  of  the  six  expectations  using  the  form  (3)  and  using  the 

form  (4),  for  the  later  employing  (8)  or  (9)  as  appropriate.  For  instance,  for  E(^|Y)  the 

2  0 

estimator  of  the  form  (3)  is  -  while  the  estimator  of  the  form  i,4)  employing 

J=1 


2  0 
(8)  is  E 

j=l 

var(^|Y) 


dj/20  where  dj  =  (1  -  +  Y2  +  l)(X|j°’  +  ^'2  +  Y5  +  2)  For 

2  0 

the  estimator  of  the  form  (3)  is  E  -  ^**®)^/20  where  is  the 

j=l 


average  of  the  ‘j®.  But  since 

var(^|Y)  =  E(var(<?|Y,?7,Z))  +  var  E{0\Y.rj,Z)  (10) 

we  can  use  var(^|  Y,r7,Z)  from  (8)  to  obtain  an  estimator  of  the  form  (4)  for  the  first  term 
on  the  right  hand  side  of  (10).  For  the  second  term  on  the  right  hand  side  we  use  the 
estimator  E(dj  -  d)^/19  where  d  =  Sdj /20. 

For  both  of  the  estimators  of  each  of  the  six  e.xpectations  over  the  5000  replications 
we  calculated  the  average,  the  average  bias,  the  variance  and  the  mean  square  error.  In  all 
cases  bias  was  inconsequential  in  the  sense  that  mean  square  error  agrees  with  variance  to 
at  least  three  significant  places.  Hence  in  Table  1  we  present,  for  each  expectation  the 
"exact"  value,  the  average  value  using  the  form  (3)  estimator  with  associated  standard 
error  in  parentheses  and  the  average  value  using  form  (4)  estimator  with  associated 
standard  error  in  parentheses.  The  performance  of  the  Gibbs  sampler  is  remarkably  good 
especially  given  such  small  t  and  m.  In  practice,  by  the  last  iteration,  it  would  typically 
be  >1000  yielding  SE's  at  most  one-tenth  of  those  shown.  As  it  is,  for,  say  E[r?|Y]  using 


the  estimator  of  form  (4)  virtually  all  estimates  fell  in  the  interval  (.1_’2S.  .12361.  .Vote 
also  the  benefit  of  "Rao-Blackwellizine;".  In  all  cases  the  standard  error  of  the  form  i21 
estimators  is  at  most  5S%  of  the  corresponding  form  ( 1)  estimator.  Thus  the  MSE's  of  the 
Rao  Blackwellized  estimators  are  at  most  35%  their  non-conditional  counterparts. 

[INSERT  TABLE  1  HERE] 

4.2  A  Variance  Components  Model 

Box  and  Tiao  (1973,  Chapter  5)  present  a  set  of  dyestuff  data  wherein  five  samples 
of  six  randomly  chosen  batches  of  raw  material  were  taken  and  a  single  laboratory 
determination  of  product  yield  was  made  for  each  of  the  resulting  30  samples.  If  Yjj 
denotes  the  yield  of  the  jih  sample  from  the  ith  batch,  i=i,-'-.6,  then  we 

define  a  variance  components  model  for  Yij  by  Yjj  =  +  Cij  where,  assuming 

conditional  independence  throughout,  [0j  [fiji<Tg]  =  N(0,(Tg).  Given  0; 

and  (Tg,  for  the  ith  batch  Yi  and  Sj  are  sufficient  and  thus  we  summarize  the  data  as 

Batch  1  2  3  4  5  6 

Y.  1505  1528  1564  1498  1600  1470 

Sj  63.05  33.28  37.98  68.70  50.00  31.02 

The  usual  ANOVA  estimate  of  Oq  is  1764.05,  of  is  2451.25.  Let  Y  denote  {Yij}, 
let  0=(0i,- •  •  ,02)  and  assume  that  and  <7^  are  independent  with  priors 

[n\  =  N(/<o,<ro)^,  [oq]  =  IG(ai,bi)  and  [<r^|  =  IG(a2,b2)  where  IG  denotes  the  inverse 
Gamma  distribution  arid  /Jo,<7o,ai,bi,a2,b2  are  assumed  known.  For  illustrative  purposes 
we  take  the  rather  vague  specifications  /io=0,  (7o=10‘^,  ai=0,  bi=l,  a2=0,b2=l.  The 
Gibbs  sampler  involves  OQ,a^,  n  and  0.  The  required  full  conditional  distributions  are 
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(70  j  =  IG  (3.5.  1000  -f-  1 .5)l!(0i-/i)' )  111 

1,7;  I  V./i.0.4l  =  IG  (15.  -  ')!:(Vi-0i)-’  i  '  :2i 

[0|Y,/i,<T0,jp]  =  N(((7^+5<T0)'‘(5<70Y  +  tTpl).  {a\->roa\)  'll 

where  \’'=(Yi,-  •  ^Ye),  1  is  a  6*1  column  of  I's  and  1  is  a  6*6  identity  matrix. 

Interest  typically  focuses  on  the  distributions  j<70|Yj.  [^Tg|Y]  and  (<70.ap|\'].  The 
marginal  distribution  distributions  are  straightforwardly  handled  using  estimates  of  the 
form  (5)  and  have  been  created  for  a  different  and  in  fact  more  difficult  data  set  in  Gelfand 
et  al  (1989).  With  regard  to  expectations  we  consider  E((T0|Y).  E((7p|Y)  and 
Using  "exact"  numerical  methods  (Naylor  and  Smith.  1982)  these 
expectations  are  2496.6,  2879.4,  and  1.019  respectively.  Using  the  Tierney-Kadane 
approximations  (1)  on  the  ((7e,<70)  scale  yields  very  poor  estimates.  E((70|5')  =  537S.5. 
E((7p]Y)  =  6393.9.  and  E(a\la\)  =  1.044.  .Achcar  and  Smith  (1989)  report  a  less  severe 
but  similar  finding  and  recommend  a  log  transformation  of  the  variance  components.  1  In 
fact,  the  exact  answers  above  were  obtained  working  with  log  scales).  Unfortunately  on 
the  log  scale  the  results  are  not  much  better:  Efa^jY)  =  1679.7,  E(apl\')  =  3669.0  and 
E(<70/<7e|V)  =  2.959. 

In  addressing  the  performance  of  the  Gibbs  sampler  we  focused  on  E(a0/(7‘ 1 5  ). 
Using  a  simulation  based  on  5000  repetitions  with  i=60,  m=50  at  each  repetition,  we 
examined  four  estimators: 

50  2(30) 

-  estimator  1  is  of  the  form  (3)  i.e.  (  E 

j=l 


'50  lOOO+It0j 

-estimator  2  is  of  the  form  (4)  usin^  (11)  i.e.  i  E  - -  /50 

J  =  1  -’.-CT,. 


‘  ‘J  1  c/  0  j  j 

-estimator  3  is  of  the  form  (4)  using  (12)  i.e.  i  E - /50 

lj  =  l  4E5^  + 

-  estimator  4  is  the  unweighted  average  of  estimators  2  and  3. 

The  brief  Table  2  reports  the  exact  value  along  with  the  average  value  and  standard 
error  for  each  of  the  four  estimators.  The  performance  of  the  Gibbs  sampler  is  quite  good 
especially  since  m  is  still  much  smaller  than  we  would  use  in  practice.  While,  after  the 
fact,  estimator  3  performs  better  than  estimator  2  (one  had  to  be  better  than  the  other!),  in 
the  absence  of  such  knowledge  estimator  4  seems  a  satisfactory  choice.  ,\s  a  matter  of 
record  this  simulation  yielded  for  E(<T9|  Y)  an  average  value  of  2500.1  with  standard  error 
322.8,  for  E(crg|Y)  an  average  value  of  2877.8  with  standard  error  85.7.  .Again,  in 
practice,  a  much  larger  m  would  be  used.  As  noted  in  Section  3  transformations  need  not 
be  considered. 


INSERT  TABLE  2  HERE 

.Note  that  if  interest  was  in  the  intra-class  correlation  coefficient.  we 

may  straightforwardly  obtain  Rao— Blackwellized  density  estimates  for  [<T0/(<T0+<T^)i  V) 
using  (5).  However  in  calculating  E((T0/(fr0+(7p)l  Y)  only  an  estimator  of  the  form  (3)  is 
readily  available. 

4.3  A  Normal-linear  Hierarchical  Model 

.A  widely  used  version  of  the  normal-linear  hierarchical  model  introduced  by 
Lindley  and  Smith  (1972)  takes  the  following  general  form.  Data  on  the  i‘^  of  k 


1') 


individuals  is  modeled  by  [Yjl^pCr"']  =  N(Xj  .a*  I^),  with  the  individual  parameters  0^ 
themselves  modeled  by  conditional  independence  tor  bpine; 

assumed  throughout.  The  Bayesian  modeling  is  then  completed  by  assuming, 
independently,  that  [E’‘]=W({pR)  Vp)  and  [(7^]=lG(U'^.^Pf,  ).  Here.  W 

and  IG  denote  Wishart  and  inverse-gamma  distributions,  respectively.  Typical 
applications  of  such  a  model  are  to  population  random  effects  studies  (see.  for  example. 
Racine— Poon  and  Smith,  1989). 

In  such  cases,  the  (Yj  |  describe  Uj  individual  measurements  (e.g.  growth  over 
time/response  to  dose),  the  k  regression  coefficients  characterize  the  individual  growth 
or  response  pattern  and  the  (^j  l/i,  2,)  reflect  the  fact  that  the  k  individuals  are  a  random 
sample  from  a  population.  A  variety  of  Bayesian  and  empirical  Bayesian  procedures  have 
been  proposed  for  making  inferences  about  ,  /i,  £.  but  exact  calculations  have  hitherto 
proved  infeasible,  largely  due  to  the  presence  of  the  unknown  population  covariance  matrix 
S.  Discussion  and  illustration  of  inferences  and  predictions  for  such  hieraurchical  models  is 
given  in  Gelfand  et.  al.  (1989).  Here  we  simply  indicate  how  the  Gibbs  sampler  provides  a 


straightforward  and  easily  implemented  approach  to  the  general  problem  of  estimating  E. 

k  k 

Defining  Y={Y,  ■  ,YJ,  ,■  ■  ■  .0^^),  '0=k'  E  9-.  n=  E  n,,  D;=-T-^X^Xi  - 

i  =  1  i  =  1 

XT'.  V=(kE'*+C  *)  '.  the  Gibbs  sampler  is  easily  seen  to  take  the  form  (suppressing 
explicit  dependence  on  r;,C,p,R,i'o,ro) 

=  N'(D,(<7  ’xTY,+!rV),Di)  (i=l.  ..,k) 

[/i|Y,«,r'.u*i  =  N(v{kr''J+c‘i,),v) 

p' I Y, =  VV(|I(«i-,i)(*iV+/’Rr'.  k+p)  (13) 

i 

Iff’  I  Y.ftM,ir'|  =  IG(«n+v„),  tRY,  -Xi  I,  )^(  Y,  -X,  )+^„  rj], 

i 

Simulation  from  the  N  and  IG  distributions  is  immediate;  simulation  from  the  W 


distribution  is  straightforwardly  accomplished  using  the  algorithm  of  Odell  and  Feiveson 
(1966).  Estimation  of  E  (via  E^‘)  using  the  form  (4)  is  directly  achieved  using  the  mean  of 
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(13).  .-^3  an  indication  of  the  computing  effort  required,  for  an  example  with  k=30.  nj=.6 
and  Xj  0.^  having  the  form  of  a  straight-line  growth  curve  (so  that  involves  a  total 

of  66  parameters),  satisfactory  convergence  (checked  by  empirical  Q-Q  plots  for  the 
eigenvalues  of  IT')  was  obtained  at  t=25  with  m=100. 


5.  Discussion 

Our  general  discussion  and  the  three  illustrative  examples  reveals  the  ease  with 

which  posterior  expectations  can  be  calculated  using  the  Gibbs  sampler,  even  with 

"awkward"  raodel/data  combinations.  In  particular,  we  note  the  striking  advantage  of  the 

approach  in  overcoming  problems  due  to:  awkward  posterior  surfaces,  otherwise  requiring 

subtle  and  sophisticated  numerical  or  analytic  approximation  tech.niques;  functions 

which  may  not  be  continuous  much  less  dif ferentiablej  dimensionality  pro^blems 

arising  from  highly  parameterized  models^  intractable  distributions  arising 
from  missing  data;  "difficult"  objects,  such  as  covariance  matrices,  whose 
component  parameters  implicitly  involve  complicated  constraints. 

Most  attractively,  we  need  not  redo  the  analysis  for  each  desired  expectation.  .Ml 
such  expectations  are  readily  estimated  from  the  final  sample. 
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Table  1;  Expectations  for  the  Aggregated  Multinomial  Model 

Average  Value  of  Form  (3) 

Est  i  mator  (S .  E. ) 


Elxpectation 

Exact  Value 

my) 

.1232 

E(02|Y) 

.2881 

m"\y) 

.2174x10* 

Var(0|Y) 

.1776x10* 

Var(7,lY) 

.6561x10'^ 

.1229  (.0178) 

.2879  (.0302) 
.2158x10'^  (.0639x10*) 
.1781x10'*  (.0551x10*) 
.6476x10'^  (.2732x10'^) 


Average  V a  lue  of  Form  (1) 
Est  i  mator  (S .  E.  ) 


.1232  ( 

.0082) 

.2883 

(.0171) 

2173x10'*  i 

.0280x10'*) 

.1774x10'*  1 

.  .0218x10  * 

.6556x10  ^  1 

( .0902x10'^ 

Table  2:  The  Expected  Variance  Ratio  for  the  Variance  Components  Model 


Exact 

Estimator  1 

Estimator2 

Estimator  3 

Estimator 

Value 

(SE) 

(SE) 

(SE) 

(SE) 

1.019 

1.095 

1.086 

1.028 

1.057 

(.098) 

(.091) 

(.022) 

(.039) 
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